Fix a failing Pipeline Flow - Instrumentation & Observability
## **Problem**
We lack the instrumentation to answer two fundamental questions about the Fix Pipeline flow:
1. Is it delivering value to developers?
2. Is it behaving as designed?
Without these metrics, we cannot measure whether the flow is saving developer time or resolving failures effectively.
## **Proposal**
Build a clear instrumentation layer for the Fix CI/CD Pipeline with Duo flow that lets product, engineering, and data teams measure adoption, performance, failure patterns, and step-level behavior, and that can be reused across all Duo Workflow flows.
### Flow Impact Metrics
Why: Measure the tangible time savings the feature delivers to developers, providing a direct indicator of business value and ROI to justify continued investment.
| Priority | Metric | Target Visualization | Notes |
|----------|--------|----------------------|-------|
| High | Average time from pipeline failure to green pipeline | Tableau | https://gitlab.com/gitlab-data/product-analytics/-/work_items/3211+ |
| High | Fix Code Suggestion Acceptance Rate | Tableau | New internal event needed - gitlab#598452 |
| High | GitLab Credits Used | Monetization Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/DuoAgentPlatformMonetizationMetrics/BillableEvents?:iid=1) (SAFE) | |
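
As an illustration of the first metric, here is a minimal sketch of how the average time from pipeline failure to green pipeline could be computed. It assumes each flow run can be reduced to a `(failed_at, green_at)` timestamp pair; the function name and the input shape are hypothetical and do not reflect an existing data model.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_time_to_green(pairs):
    """Return the mean time between a failure and the next green pipeline.

    `pairs` is a list of (failed_at, green_at) datetimes (hypothetical shape).
    Runs that have not reached a green pipeline (green_at is None) are skipped.
    Returns None when no run has recovered yet.
    """
    durations = [
        (green_at - failed_at).total_seconds()
        for failed_at, green_at in pairs
        if green_at is not None
    ]
    if not durations:
        return None
    return timedelta(seconds=mean(durations))


# Illustrative data only: two recovered pipelines and one still failing.
sample = [
    (datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 30)),
    (datetime(2025, 1, 1, 11, 0), datetime(2025, 1, 1, 12, 30)),
    (datetime(2025, 1, 1, 13, 0), None),
]
print(average_time_to_green(sample))
```

Skipping unrecovered pipelines keeps the average meaningful, but it also hides failures that never reach green, so that count is probably worth reporting alongside the average.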
### Standard Flow Metrics
Why: Understand overall feature adoption and where users drop off to prioritize reliability and UX improvements.
| Priority | Metric | Target Visualization | Notes |
|----------|--------|----------------------|-------|
| High | Number of times Fix Pipeline Flow was triggered | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| High | Number of times Fix Pipeline Flow completed successfully | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| High | Number of times Fix Pipeline Flow failed to complete | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| Medium | Number of flows aborted by user | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| High | Conversion rate: Number of flows that resulted in a fix/Number of flows triggered | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | |
| High | Conversion rate: Number of flows that resulted in a comment/Number of flows triggered | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | |
| High | Conversion rate: Number of flows that resulted in an auto-retry/Number of flows triggered | Tableau | |
| High | LLM calls per flow | Monetization Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/DuoAgentPlatformMonetizationMetrics/DuoAgentPlatformMonetizationInsights?:iid=1) (SAFE) | |
| High | Token consumption per flow | Monetization Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/DuoAgentPlatformMonetizationMetrics/TokenConsumptionMetrics?:iid=1) (SAFE) | |
| High | Duration of the Flow | Product Adoption Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/AgenticAIProductAdoption/Overview?:iid=1) | |
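
The three conversion rates in the table share one denominator, the number of flows triggered. A minimal sketch of that arithmetic follows, assuming the counts have already been aggregated; the `FlowCounts` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowCounts:
    """Aggregated counts for a reporting period (hypothetical field names)."""
    triggered: int
    fixes: int
    comments: int
    auto_retries: int


def conversion_rates(counts):
    """Return each outcome count divided by the number of flows triggered.

    Rates are None when nothing was triggered, to avoid dividing by zero.
    """
    if counts.triggered == 0:
        return {"fix": None, "comment": None, "auto_retry": None}
    return {
        "fix": counts.fixes / counts.triggered,
        "comment": counts.comments / counts.triggered,
        "auto_retry": counts.auto_retries / counts.triggered,
    }


# Illustrative numbers only, not real usage data.
rates = conversion_rates(FlowCounts(triggered=200, fixes=50, comments=30, auto_retries=20))
print(rates)
```

Whether a single flow can count toward more than one outcome (for example, a comment and a fix) is not specified here, so the rates are not assumed to sum to 1.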
### Failure Classification
Why: Categorize why flows fail to prioritize the highest-impact fixes and track whether the LLM is correctly scoping problems it can solve.
<table>
<tr>
<th>Priority</th>
<th>Metric</th>
<th>Visualization</th>
<th>Notes</th>
</tr>
<tr>
<td>Medium</td>
<td>Failure Reason/Category - Can we log this information based on the LLM reasoning?</td>
<td>Tableau</td>
<td></td>
</tr>
<tr>
<td>Medium</td>
<td>
Commonly suggested fix
* Is it to retry the job, push out an MR, or change the CI config?
</td>
<td>Tableau</td>
<td></td>
</tr>
</table>
### Flow Step Level Metrics (to implement the above)
Why: Understand how the flow executes internally to identify bottlenecks, unexpected paths, and opportunities to improve the flow's decision-making.
| Priority | Metric | Visualization | Notes |
|----------|--------|---------------|-------|
| Low | Step Duration | Kibana | |
| Low | Step Status | Kibana | |
| Medium | Step Failure Reason | Kibana | |
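
The three step-level metrics above could be captured together in one record per step. The sketch below shows one possible shape, assuming steps run as ordinary Python code and records go to an in-memory list; `record_step`, the record keys, and the step names are hypothetical, and a real implementation would emit to the logging pipeline that feeds Kibana instead.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_step(name, sink):
    """Time one flow step and append a record with status, duration, and failure reason."""
    started = time.monotonic()
    record = {"step": name, "status": "success", "failure_reason": None}
    try:
        yield record
    except Exception as exc:
        # Record the exception type as the failure reason, then re-raise
        # so the caller still sees the original error.
        record["status"] = "failed"
        record["failure_reason"] = type(exc).__name__
        raise
    finally:
        record["duration_s"] = time.monotonic() - started
        sink.append(record)


events = []

with record_step("analyze_job_logs", events):
    pass  # placeholder for real step work

try:
    with record_step("push_fix", events):
        raise ValueError("simulated step failure")
except ValueError:
    pass

for event in events:
    print(event)
```

Emitting a single record per step keeps duration, status, and failure reason joinable without extra correlation logic.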
### **Filtering / Segmentation Dimensions**
* GitLab project
* Trigger type (DAP automation vs. manual)
* Pipeline Source Type (Merge Request, Scheduled, Push, etc.)